Modeling the Inhibition of Tetrahymena pyriformis Growth by Aliphatic Alcohols and Amines as Environmental Pollutants
Fatiha Mebarki1, Souhaila Meneceur2, Nadia Ziani3,5, Khadidja Amirat4,5, Abderrhmane Bouafia2*
1Faculty of Science and Technology, Department of Material Sciences, Amine Elokkal El Hadj Moussa Eg Akhamouk University, Tamanrasset 11000, Algeria.
2Department of Process Engineering and Petrochemistry, Faculty of Technology,
University of El Oued, 39000 El-Oued, Algeria.
3Faculty of Science, Chemistry Department Badji Mokhtar University Annaba, Annaba, Algeria.
4Faculty of Science, Department of Chemistry University of Sétif 1 - Ferhat Abbas, El Bez,
Setif 19000 Tamanrasset, Algeria.
5Renewable Energy Development Unit in Arid Zones (UDERZA), University of El Oued El-Oued, Algeria.
*Corresponding Author E-mail: abdelrahmanebouafia@gmail.com
ABSTRACT
To assess the relative toxicity of a mixed series of 21 linear and branched-chain alcohols and 9 normal aliphatic amines, expressed as the 50% inhibitory growth concentration (IGC50) for Tetrahymena pyriformis, a quantitative structure-activity/property/toxicity relationship (QSAR/QSPR/QSTR) study was conducted (20 training compounds, 10 test compounds). The least squares (LS) method was applied using MINITAB 16, and a non-parametric robust regression method, least absolute deviation (LAD), was applied using calculation programs written in MATLAB. The simple linear regression is based on a single theoretical molecular descriptor, H4p (a GETAWAY descriptor), computed with the DRAGON software. Regression performs best when the errors are normally distributed, in which case the LS method is used for the statistical analysis. When the data do not satisfy the normality assumption, we move to a method that is more robust in the presence of outlying points, namely the LAD method. For the chosen QSAR model fitted by least squares, the statistical analysis gave an adjusted coefficient of determination of 97.24% and a standard deviation S = 0.248. The Anderson-Darling test (AD = 1.57), the skewness coefficient (sk = 2.14), the kurtosis coefficient and the Jarque-Bera test (JB = 42.84) showed that the residuals do not follow the normal law. With the LAD method, the coefficient of determination and the standard deviation are both highly sensitive to the presence of aberrant compounds (outliers): after the aberrant compounds were removed, R² rose from 87.96% to 94.18% (an increase of 6.22 percentage points) and the standard deviation S fell from 0.399 to 0.303 (a decrease of about 24%), which is interpreted as a better fit. Removing the aberrant compounds produced no appreciable change in the coefficients of the line, which indicates that the fitted line is stable and demonstrates the robustness of the LAD method to aberrant compounds. Consequently, the selected one-descriptor model is good and statistically strong. Three influential compounds were detected (one in the training set and two in the test set), and no aberrant compounds remain in the studied sample.
Keywords: Alcohols and Amines, Aquatic toxicity, Simple linear regression, least squares, Least Absolute Deviation Regression.
INTRODUCTION:
The occurrence and determination of amines have drawn a lot of interest in recent years, as environmental issues and global environmental change are garnering an ever-increasing amount of attention worldwide. These amines are a major cause of social and sanitary issues and can be found in a variety of environmental compartments, including air, water, soil, and food.
The toxic effects of nitroaromatics, which are dangerous substances, include skin hypersensitivity, immunotoxicity, germ cell degeneration, inhibition of liver enzymes, and a suspected carcinogenicity. The lack of experimental data has made it difficult to model the toxicity of nitroaromatic chemicals. Because they are widely used in industry, nitrobenzenes (NBs) have a significant potential to pollute the environment, and they have even been found in surface waters 1.
Structure-toxicity models draw on biology, chemistry and statistics. The intersection of these three fields has allowed structure-activity relationships to become a recognized specialty within toxicology.
Quantitative structure-activity relationships (QSARs) are expected to be used more and more frequently over the coming ten years to forecast the toxicity of both new and existing compounds. The use of these methods to reduce or eliminate animal testing in the toxicological assessment of existing chemicals for regulatory purposes (e.g. in the REACH legislation) will receive a lot of attention 2. The publication in 1962 of a paper 3 that demonstrated a relationship between biological activity and the octanol-water partition coefficient is regarded as the official birth of QSAR 4. In the current paper, simple linear regression QSAR models are proposed to assess the relative toxicity of organic chemicals, expressed as the logarithm of the inverse of the 50% inhibitory growth concentration (IGC50) for Tetrahymena pyriformis.
The mean is the best measure of central tendency, the standard deviation is the best measure of dispersion, and the method of least squares (L2 estimate) is the best regression method, provided that the values of the independent variables are known precisely and the errors in the observations of the dependent variable are normally distributed.
This no longer holds, particularly when the dependent variable includes aberrant data, that is, observations that lie far from the bulk of the data.
Regression by the least squares (LS) approach performs best when the errors follow a normal distribution. The LS method should not be used to estimate the regression parameters when the data do not meet the normality assumption because of outliers and/or multicollinearity, since the estimation error of the parameters can increase 5.
Such situations make it necessary to question several aspects of the problem and to change the modeling approach, as in the present work.
The objective of this study is to choose the best analysis method for a simple linear model when confronted with a major problem in regression analysis, namely errors that are not normally distributed, or that are normally distributed but disturbed by aberrant data. The study was conducted on 30 compounds (a mixed series of 21 linear and branched-chain alcohols and 9 normal aliphatic amines), using the 50% inhibitory growth concentration (IGC50) for Tetrahymena pyriformis, divided into 20 training and 10 test compounds.
The normality of the error distribution was checked with the Anderson-Darling (AD) test and with the skewness (symmetry) and kurtosis (flatness) coefficients, which together give the Jarque-Bera normality test. These tests indicated a non-normal distribution of errors, which rules out the least squares method in favor of the least absolute deviation (LAD) method. With the LAD method, we examine the goodness-of-fit and check for aberrant data using a line plot. Finally, the standard error and the coefficient of determination were used to assess the quality of fit after removal of the aberrant values.
MATERIALS AND METHODS:
DATA SET:
A set of 21 linear and branched-chain alcohols, chosen to represent a range of chain length and branching, and 9 normal aliphatic amines were used to evaluate toxicity.
These nonionic and nonreactive compounds inhibit the growth of Tetrahymena pyriformis, the most studied common freshwater hymenostome ciliate, which measures roughly 50 µm in length and 30 µm in width.
The ciliates were grown in axenic culture and, after 48 h of incubation, the population density was determined spectrophotometrically as optical density (absorbance) at 540 nm. The experimental data set was provided by Schultz.
Descriptor Generation:
The chemical structure of each compound was drawn on a computer using the HyperChem program 9 and pre-optimized with the MM+ molecular mechanics method (Polak-Ribiere algorithm).
The final geometries of the minimum-energy conformations were then determined with the semi-empirical PM3 method at the restricted Hartree-Fock level with no configuration interaction, using a gradient norm limit of 0.01 kcal Å⁻¹ mol⁻¹ as the stopping criterion. The optimized geometries served as the input to the DRAGON software (version 5.3) 10 for the generation of 74 3D geometrical descriptors.
Geometrical descriptors are defined from the three-dimensional structure of a molecule, which requires knowledge of the relative positions of the atoms in 3D space; they offer information on, and the ability to distinguish between, molecular structures and molecular conformations.
Subsets of descriptors were selected with the genetic algorithm in the MOBYDIGS software of Todeschini 11, using the calibration data to maximize the prediction coefficient.
Statistical analysis:
Simple linear regression:
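In this work, we relied on mathematical software to fit a one-variable (first-order linear) equation of the general form 11-12:
$F(x) = a + bx$ (1)
$b$: slope (direction coefficient) of the regression line:
$b = r \, \dfrac{S_y}{S_x}$ (2)
where $r$ is the correlation coefficient between $x$ and $y$.
$S_y$: standard deviation of the $y$ values:
$S_y = \sqrt{\dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}}$ (3)
$S_x$: standard deviation of the $x$ values:
$S_x = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$ (4)
$a$: intersection point of the graph with the ordinate axis (intercept of the regression line):
$a = \bar{y} - b\,\bar{x}$ (5)
$\bar{y}$: mean of the $y$ values:
$\bar{y} = \dfrac{1}{n}\sum_{i=1}^{n} y_i$ (6)
$\bar{x}$: mean of the $x$ values:
$\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$ (7)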
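As an illustration of the one-variable fit F(x) = a + bx, the following minimal sketch computes the least squares intercept and slope from the sample means. It uses the covariance form of the slope, which is algebraically equivalent to the usual standard-deviation form. The function name `least_squares_line` and the demonstration data are assumptions made for this example; they are not taken from the study.

```python
from statistics import mean

def least_squares_line(x, y):
    """Return (a, b) such that y is approximated by a + b*x (ordinary least squares)."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squares about the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b

# Hypothetical, perfectly linear demonstration data (y = 1 + 2x)
x_demo = [0.0, 1.0, 2.0, 3.0]
y_demo = [1.0, 3.0, 5.0, 7.0]
a_demo, b_demo = least_squares_line(x_demo, y_demo)
print(a_demo, b_demo)
```

With real data, x would hold the H4p descriptor values and y the -LogIGC50 values of the training compounds.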
By the Least Squares LS method:
Least squares is the oldest and simplest method of linear regression. It is valid under certain conditions 11, the most important of which is examined in this study. Its principle is to minimize the sum of the squared differences between the observed and predicted values.
The statistical analysis of the model is based on the following criteria, where t is the Student's t value for the chosen risk level and number of degrees of freedom (ddl).
Internal training criteria:
$R^2$ is the coefficient of determination 11-12:
$R^2 = 1 - \dfrac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$ (8)
(Calculated
according to equation 2) of the model 11
(9)
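$S$ is the standard deviation 11-12:
$S = \sqrt{\dfrac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - p - 1}}$ (10)
$R^2_{adj}$ is the adjusted $R^2$ 11-12:
$R^2_{adj} = 1 - (1 - R^2)\,\dfrac{n - 1}{n - p - 1}$ (11)
where $n$ is the number of training compounds and $p$ the number of descriptors in the model.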
The P-value is the probability of obtaining a test statistic that is at least as extreme as the actual calculated value, if the null Hypothesis is true.
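As a quick numerical check of the adjusted coefficient of determination, the sketch below applies the standard formula R²adj = 1 - (1 - R²)(n - 1)/(n - p - 1) to the values reported in this paper for the least squares model (R² = 97.39%, n = 20 training compounds, p = 1 descriptor). The function name is an assumption made for this example.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R2 for a model with p descriptors fitted on n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Values reported for the one-descriptor least squares model
r2_adj_percent = 100.0 * adjusted_r2(0.9739, n=20, p=1)
print(round(r2_adj_percent, 2))
```

The result is within rounding of the 97.24% reported for the model, which supports reading that figure as the adjusted R².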
External validation criteria:
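The external $Q^2_{ext}$ is defined as:
$Q^2_{ext} = 1 - \dfrac{\sum_{i=1}^{n_{ext}} (y_i - \hat{y}_i)^2 / n_{ext}}{\sum_{i=1}^{n_{tr}} (y_i - \bar{y}_{tr})^2 / n_{tr}}$ (12)
where $y_i$ and $\hat{y}_i$ are the measured and predicted values of the dependent variable (over the prediction set), $\bar{y}_{tr}$ is the mean value of the dependent variable for the training set, and $n_{tr}$ and $n_{ext}$ are the number of compounds in the training set and in the external set.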
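The external validation coefficient described above can be sketched as follows. The formula used here, one minus the ratio of the mean squared prediction error on the external set to the mean squared deviation of the training responses about the training mean, is an assumption consistent with the quantities named in the text; the function name and the demonstration data are hypothetical.

```python
def q2_ext(y_ext, y_pred_ext, y_train):
    """External Q2: prediction error on the external set, scaled by training-set variance."""
    n_tr, n_ext = len(y_train), len(y_ext)
    y_bar_tr = sum(y_train) / n_tr
    # Sum of squared prediction errors over the external (test) set
    press_ext = sum((yo - yp) ** 2 for yo, yp in zip(y_ext, y_pred_ext))
    # Sum of squared deviations of the training responses about the training mean
    tss_tr = sum((yt - y_bar_tr) ** 2 for yt in y_train)
    return 1.0 - (press_ext / n_ext) / (tss_tr / n_tr)

# Hypothetical demonstration data
q2_demo = q2_ext(y_ext=[1.0, 2.0], y_pred_ext=[1.5, 2.5], y_train=[0.0, 1.0, 2.0, 3.0])
print(q2_demo)
```

A value close to 1 means that the external compounds are predicted almost as well as the training variance allows.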
The model needs to satisfy:
Normality of residuals:
For these tests to be valid, the distribution of the residuals must be normal, so that the interpretation of the typical forecast error is not biased.
Anderson-Darling test (AD) 12:
The Anderson-Darling test is an alternative to the Kolmogorov-Smirnov test, with the difference that it places more emphasis on the tails of the distribution.
A normal law has a skewness (symmetry) coefficient of 0 and a kurtosis (flatness) coefficient of 3.
The Jarque and Bera test (1984) 12:
This test is based on the concepts of skewness (asymmetry; the normal law is a symmetrical law) and kurtosis (flatness, which indicates the weight of the distribution tails). It allows us to determine whether a statistical distribution is normal, using a statistic with two degrees of freedom.
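For reference, one common form of the Jarque-Bera statistic is JB = (n/6)(S² + K²/4), where S is the skewness and K the excess kurtosis (kurtosis minus 3). The sketch below is a hedged check under two assumptions that are not stated in the paper: that the kurtosis magnitude of 5.75 reported later is an excess kurtosis, and that the test was run on the n = 20 training residuals.

```python
def jarque_bera(skewness, excess_kurtosis, n):
    """Jarque-Bera statistic from sample skewness and excess kurtosis."""
    return n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)

# 5% critical value of the chi-squared law with two degrees of freedom
CHI2_CRIT_2DF_5PCT = 5.991

# Assumed inputs: sk = 2.14, |ku| = 5.75 read as excess kurtosis, n = 20
jb = jarque_bera(2.14, 5.75, 20)
reject_normality = jb > CHI2_CRIT_2DF_5PCT
print(round(jb, 2), reject_normality)
```

Under these assumptions the statistic comes out close to the reported JB = 42.84 and far above the critical value, so normality is rejected. The sign of the kurtosis does not matter here because it enters squared.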
By the Least Absolute Deviation (LAD) method:
Since the method of least squares places heavy weight on large error terms, we turn to a more robust alternative estimator that minimizes the absolute values of the error terms rather than their squares. The least absolute deviation (LAD) estimator, suggested by Gauss and Laplace, belongs to the family of quantile estimators. Unlike least squares, this method does not put excessive weight on very divergent observations and thus produces estimators that are more robust to aberrant values.
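To make the LAD criterion concrete, here is a minimal sketch for a one-descriptor line. It relies on a known property of LAD regression: when the x values are not all equal, at least one optimal line passes through two of the data points, so for small samples the exact optimum can be found by enumerating pairs. This is an illustration only and is not the MATLAB program used in the study; the function name and data are hypothetical.

```python
from itertools import combinations

def lad_line(x, y):
    """Exact LAD fit of y ~ a + b*x for small samples, by enumerating point pairs."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # two points with the same x do not define a usable line
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        # LAD criterion: sum of absolute residuals
        cost = sum(abs(yk - (a + b * xk)) for xk, yk in zip(x, y))
        if best is None or cost < best[0]:
            best = (cost, a, b)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]

# Hypothetical data on the line y = 1 + 2x, with one aberrant value at x = 4
x_demo = [0.0, 1.0, 2.0, 3.0, 4.0]
y_demo = [1.0, 3.0, 5.0, 7.0, 30.0]
a_lad, b_lad = lad_line(x_demo, y_demo)
print(a_lad, b_lad)
```

In this example the LAD line ignores the aberrant point and recovers the underlying line, whereas a least squares fit would be pulled toward it. The pair enumeration costs O(n³) operations, which is acceptable for data sets of a few dozen compounds.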
Internal validation criteria:
The goodness-of-fit was evaluated by the coefficient of determination 15:
R2=1-
(13)
and standard deviation 15:
S= .
(14)
(15)
where $y_i$ is the actual value and $\hat{y}_i$ the value computed with the LAD method. Cross-validation techniques were applied to evaluate the internal prediction ($Q^2$; bootstrap) and the robustness (Y-scrambling) of the model.
Leave-one-out (LOO) cross-validation 15 consists in recomputing the model on (n-1) objects and using the resulting model to predict the value of the dependent variable for the excluded compound. The process is repeated for each of the n objects of the training set. The sum of the absolute values of the prediction errors (denoted by the acronym PRESS, for Predictive Residual LAD) is a measure of the dispersion of the estimates. It is used to define the prediction coefficient ($Q^2$) and the mean standard deviation of prediction (EQMP) 15:
(16)
EQMP=
(17)
The total sum of absolute values is taken about the median (med) of the n observations; the predicted response of the i-th object is estimated with a model obtained without using this object; the summation runs over all training compounds.
A value $Q^2$ > 0.5 is regarded as satisfactory, and a value $Q^2$ > 0.9 is excellent.
In fact, a high value of $Q^2$ is a necessary condition for a possible high predictive capacity of a model, but this condition alone is not sufficient.
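The leave-one-out procedure can be sketched as follows. The form Q² = 1 - PRESS/SAT, with PRESS the sum of absolute leave-one-out prediction errors and SAT the sum of absolute deviations of the responses about their median, is an assumption based on the description above, since the original equations are not legible. The helper `ls_fit` is a hypothetical stand-in for whichever fitting routine is being validated (a LAD routine could be passed instead).

```python
from statistics import mean, median

def ls_fit(x, y):
    """Stand-in fitting routine: ordinary least squares line, returns (a, b)."""
    x_bar, y_bar = mean(x), mean(y)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
        sum((xi - x_bar) ** 2 for xi in x)
    return y_bar - b * x_bar, b

def q2_loo_abs(x, y, fit):
    """Leave-one-out Q2 based on absolute errors (assumed form: 1 - PRESS / SAT)."""
    press = 0.0
    for i in range(len(x)):
        # Refit the model without object i, then predict object i
        a, b = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += abs(y[i] - (a + b * x[i]))
    med = median(y)
    sat = sum(abs(yi - med) for yi in y)  # total absolute deviation about the median
    return 1.0 - press / sat

# Hypothetical, perfectly linear demonstration data (lists, so slicing works)
x_demo = [0.0, 1.0, 2.0, 3.0, 4.0]
y_demo = [1.0, 3.0, 5.0, 7.0, 9.0]
q2_demo = q2_loo_abs(x_demo, y_demo, ls_fit)
print(q2_demo)
```

Passing the fitting routine as an argument keeps the cross-validation loop independent of the estimator, so the same loop serves both least squares and LAD models.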
External validation criteria:
Equation (12) allows the calculation of $Q^2_{ext}$ 15:
$Q^2_{ext}$ = 1-
(18)
The data set was randomly divided into a training set (20 objects), used to develop the QSAR models, and a validation set (10 objects), used only for external statistical validation. The following parameter is also useful; it is calculated according to:
=
(19)
where the sum runs over the objects of the validation set.
The applicability domain was examined using the Williams plot (treated in detail in 7-8), which represents the standardized prediction residuals:
R i(lad) = (20)
as a function of the leverage values $h_i$. Equation (21) defines the leverage of a compound in the original space of the independent variables ($x_i$):
$h_i = x_i^T (X^T X)^{-1} x_i$ (21)
where $x_i$ is the row vector of the descriptors of compound i and X is the (n × p) matrix of the model, built from the descriptor values of the calibration set; the superscript T denotes the transposed vector (or matrix).
The critical value of the leverage ($h^*$) is fixed at 2(p+1)/n. If $h_i < h^*$, the probability of agreement between the measured and predicted values of compound i is as high as that of the calibration compounds. Compounds with $h_i > h^*$ reinforce the model when they belong to the calibration set; otherwise, their predicted values are doubtful without necessarily being aberrant, since the residuals can be low.
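For a one-descriptor model with an intercept, the leverage has a simple closed form, h_i = 1/n + (x_i - mean(x))² / Σ(x_j - mean(x))², which equals the diagonal of the hat matrix. The sketch below uses this closed form to flag high-leverage compounds; the function names, the demonstration data and the threshold passed in the example are hypothetical.

```python
def leverages(x):
    """Leverage of each observation for a simple linear model with intercept."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]

def high_leverage(x, h_star):
    """Indices of observations whose leverage exceeds the critical value h_star."""
    return [i for i, h in enumerate(leverages(x)) if h > h_star]

# Hypothetical descriptor values; the last one lies far from the others
x_demo = [0.0, 1.0, 2.0, 3.0, 10.0]
h_demo = leverages(x_demo)
flagged = high_leverage(x_demo, h_star=0.8)
print(h_demo, flagged)
```

The leverages always sum to the number of fitted parameters (here 2: intercept and slope), which is a convenient sanity check on any implementation.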
RESULTS AND DISCUSSION:
Study and numerical application:
More particularly, we test two estimation methods for the vector of model parameters.
The method of ordinary least squares is the best known and most widely used, and it is appropriate when the underlying distribution of the errors is normal. If the errors are not truly Gaussian and may include aberrant values, it is preferable to turn to a robust alternative that minimizes the absolute values of the errors rather than their squares.
Least Squares:
In this work we used the method of least squares, which has been widely studied. The data set was randomly divided into a training set (20 objects), used to develop the QSAR models, and a validation set (10 objects), used only for external statistical validation.
The definition of the descriptor is given in Table 1:
Table 1: Definition of the descriptor used in the toxicity prediction model.
Descriptor | Definition |
H4p | H autocorrelation of lag 4 / weighted by atomic polarizabilities. |
The best model:
(-LogIGC50) vs. (H4p): S = 0.248, R² = 97.39%, n = 20 compounds.
The H4p descriptor (H autocorrelation of lag 4 / weighted by atomic polarizabilities) encodes information on structural fragments and therefore seems particularly suitable for describing differences in congeneric series of molecules 10.
A being the number of atoms in the molecule. Table 2 lists the compounds and their -LogIGC50 values.
Table 2: Toxicity values for the selected aliphatic alcohols and amines.
No. | Compound | -LogIGC50 |
1 | Methanol | -2.77 |
2 | Ethanol | -2.41 |
3 | 1-propanol | -1.84 |
4 | 1-pentanol | -1.12 |
5 | 1-hexanol | -0.47 |
6 | 1-heptanol | 0.02 |
7 | 1-nonanol | 0.77 |
8 | 1-decanol | 1.1 |
9 | | 2.07 |
10 | 1-tridecanol | 2.28 |
11 | 2-propanol | -1.99 |
12 | 2-methyl-1-butanol | -1.13 |
13 | 3-methyl-1-butanol | -1.13 |
14 | 3-methyl-2-butanol | -1.08 |
15 | (tert)pentanol | -1.27 |
16 | 1-propylamine | -0.85 |
17 | 1-hexylamine | -0.34 |
18 | 1-heptylamine | 0.1 |
19 | 1-octylamine | 0.51 |
20 | 1-undecylamine | 2.26 |
21 | 1-butanol* | -1.52 |
22 | 1-octanol* | 0.5 |
23 | 1-undecanol* | 1.87 |
24 | 2-pentanol* | -1.25 |
25 | 3-pentanol* | -1.33 |
26 | (neo)pentanol* | -0.96 |
27 | 1-butylamine* | -0.7 |
28 | 1-amylamine* | -0.61 |
29 | 1-nonylamine* | 1.59 |
30 | 1-decylamine* | 1.95 |
The diagnostic statistics gathered in Table 3 make it possible to make comparisons and to draw several conclusions 11.
Table 3: Diagnostic statistics of the model.
Size | Model | R² (%) | Q² (%) | Q²boot (%) | Q²ext (%) | R²adj (%) | P |
1 | H4p | 97.39 | 96.69 | 96.24 | 93.91 | 97.24 | 0.00 |
r (%) | SDEP | SDEC | F | S |
98.68 | 0.265 | 0.236 | 670.90 | 0.248 |
The value of R² attests to the good fitting performance of the model, which, moreover, is very highly significant (large value of the Fisher F statistic).
The small difference between R² and Q² (0.70%) and the small difference between Q² and Q²boot (0.45%) indicate the robustness of the model.
The close values of SDEC and SDEP mean that the internal predictive ability of the model is not too dissimilar to its fitting power.
External statistical validation (Q²ext) attests to the good predictive ability of the model for compounds that did not participate in its calculation.
From the statistics results that models studied
are the best when the standard deviation S and
The one-descriptor model, fitted with the Minitab 16 software (Table 4), is:
Y = -2.49 + 9.65*H4p (22)
Table 4: Least squares estimates for the model.
Predictor | Coef | SE Coef | T | P |
Constant | -2.4981 | 0.09934 | -25.15 | 0.000 |
H4p | 9.6565 | 0.3728 | 25.90 | 0.000 |
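The T column of Table 4 is simply each coefficient divided by its standard error, and in a one-descriptor model the square of the slope's t value equals the Fisher F statistic. The short check below recomputes both from the tabulated values; the variable names are chosen for this example.

```python
# (coefficient, standard error) pairs taken from Table 4
coef = {"Constant": (-2.4981, 0.09934), "H4p": (9.6565, 0.3728)}

# Student's t value of each parameter: estimate divided by its standard error
t_values = {name: c / se for name, (c, se) in coef.items()}

# With a single descriptor, F equals the squared t value of the slope
f_from_t = t_values["H4p"] ** 2

for name, t in t_values.items():
    print(name, round(t, 2))
print("F from t:", round(f_from_t, 1))
```

The recomputed t values agree with the tabulated -25.15 and 25.90, and the squared slope t value is within rounding of the reported F = 670.90.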
Student's t tests make it possible to conclude, at the chosen type I error risk, that all parameters are significant. Their estimates are a = -2.498 and b = 9.656. We check the assumption of normally distributed errors graphically:
Normality of the errors:
The plot shows that the distribution of the residuals is disturbed (Fig. 1), so we rely on the following tests:
The Anderson-Darling test (AD = 1.57; Figs. 1 and 2) confirms the disturbance and indicates that the distribution is not compatible with the normal law (not normal).
Fig. 2 shows a negative asymmetry (left asymmetry) of the distribution, with skewness coefficient sk = 2.14 and kurtosis coefficient ku = -5.75. Together, the skewness and kurtosis coefficients give the Jarque-Bera test, a normality test for the errors with two degrees of freedom (JB = 42.84).
These tests show that the distribution of the errors is not normal.
Fig. 1. Probability plot of the errors.
Fig. 2. Summary plot of the residuals (training, test).
The goodness-of-fit:
From the plot of y = f(x), we note that all points lie on the line except for a few abnormal points (aberrant values).
Since the underlying distribution of the errors is not normal and the data may include aberrant values, least squares is not appropriate, and it is preferable to turn to a more robust alternative that minimizes the absolute values of the errors (Fig. 3).
Fig. 3. Log IGC50 value vs. H4p descriptor value.
Least Absolute Deviation Method:
We treat the same model with the least absolute deviation (LAD) method. Because this method is non-parametric, its diagnostic statistics were derived from the theoretical relations of the parametric least squares method, using its own criterion. These relationships are the result of a long effort and are published here for the first time (Fig. 4).
Fig. 4. Histogram of the correlation coefficient.
The least absolute deviation method gave parameter values that reflect the logic of the model (Table 5).
Table 5: Diagnostic statistics for the selected model by the least absolute deviation (LAD) method.
Descriptor | N_Training | N_Test | R2 | Q2 | Q_ext2 |
H4p | 20 | 10 | 87.96 | 87.96 | 79.81 |
R_adj | EQMC | EQMP | EQMP_ext | F | S |
92.84 | 37.92 | 37.92 | 54.89 | 149.59 | 0.399 |
The 20 compounds used for training (calibration of the model) are well correlated with the descriptor, hence the high value of the coefficient of determination (R² > 50%). The model has very good predictive capacity, confirmed by a Q² value above 50%. The equality between R² and Q² indicates the robustness of the model, which is, moreover, very highly significant (high value of the Fisher F statistic). In addition, the similarity of EQMC and EQMP means that the internal predictive capacity of the model is not too dissimilar to its fitting capacity. The value of Q²ext informs us about the validity of the model and its capacity to predict values that were not used to generate it.
The one-descriptor model, obtained using calculation programs in MATLAB 15, is:
Y = -2.51 + 9.64*H4p (23)
The strong positive correlation (r = 0.99) in Table 6 and Fig. 4 indicates that when H4p increases, -Log IGC50 also tends to increase.
Table 6: Correlation matrix.
 | -Log IGC50 |
H4p | 0.987 (P = 0.000) |
Student's t tests make it possible to conclude, with a type I error risk of α = 0.05, that all parameters are significant. Their estimates are a = -2.51 and b = 9.64. We check this assumption graphically through the distribution of points for the relationship between the -Log IGC50 values and the H4p descriptor, y = f(x) (Table 7).
Table 7: Least absolute deviation estimates for the model.
Predictor | Coef | SE Coef | T | P |
Constant | -2.51 | 0.1593 | -15.761 | 0.006 |
H4p | 9.643 | 0.0415 | 232.625 | 0.000 |
Fig. 5 shows that the compounds (20 training, 10 test) are distributed directly along the line, which indicates complete (100%) agreement between the calculated values and the descriptor, that is, a positive direct proportionality between the H4p descriptor and -Log IGC50.
In Fig. 6, all training points lie within the interval (-2, 2) except for a single point.
Fig. 5. Calculated -Log IGC50 vs. H4p descriptor by the LAD method.
The analysis of the residuals of the least absolute deviation (LAD) estimate (Fig. 6) shows that, in the training set, compound no. 16 (power toxicity compound) (1-propylamine) has the highest residual and observation 10 (1-tridecanol) has a high leverage value. In the validation set, all points lie within the interval (-2, 2), but three compounds stand out by their leverage values (critical value h* = 0.20): no. 3 (1-undecanol), no. 9 (1-nonylamine) and no. 10 (1-decylamine).
Fig. 6. Line plot of the LAD model.
The analysis was redone twice after removing the aberrant compounds:
*16 (power toxicity compound) (1-propylamine).
*20 (power toxicity compound) (1-undecylamine).
The new model, computed with MATLAB 15, has very good statistics:
standard deviation S = 0.303, R² = 94.18%.
Equation of the model using MATLAB:
Y = -2.51 + 9.71*H4p (24)
The coefficients of the line show almost no change after removal of the aberrant values, which means that the line is stable and that the least absolute deviation (LAD) method is not sensitive to the presence of aberrant values. We therefore deduce that the LAD method is stable and more robust.
There are no aberrant compounds in Fig. 7. Influential compounds, which can have a strong effect on the results, are:
Training set: *10 (1-tridecanol).
Test set:
*3 (1-undecanol)
*10 (1-decylamine).
Fig. 7. Line plot of the LAD model (after removing aberrant values).
Interpretation of the model:
The acute aquatic toxicity model predicts the concentration of a substance that inhibits 50% of the growth (IGC50) of the Tetrahymena pyriformis population (Fig. 7).
The H4p descriptor (H autocorrelation of lag 4 / weighted by atomic polarizabilities) encodes information on structural fragments and therefore seems particularly suitable for describing differences in congeneric series of molecules 10.
A single descriptor was able to model the concentration (IGC50) for Tetrahymena pyriformis. The value of the coefficient of the GETAWAY descriptor H4p (9.71) in Equation (24), together with the correlation coefficient (r = 0.99) and the determination coefficient (95.22%) shown in Fig. 8, demonstrates the consistent positive contribution of this descriptor to the value of -LogIGC50 by the least absolute deviation (LAD) method.
Fig. 8. Histogram of the regression coefficients.
CONCLUSION:
The method of least squares (L2 estimate) is the best regression method if the errors are normally distributed; otherwise, the treatment must be changed to a robust alternative that can cope with aberrant values, which is the approach followed in this study.
The GETAWAY descriptor H4p (H autocorrelation of lag 4 / weighted by atomic polarizabilities) was selected to model the inhibition of Tetrahymena pyriformis growth by 30 compounds (21 aliphatic alcohols and 9 amines), divided into 20 training and 10 test compounds, by simple linear regression using the least squares method.
The normality of the residuals was examined with the Anderson-Darling test (AD = 1.57), which is not compatible with the normal law, with the skewness coefficient (2.14), reported as a negative (left) asymmetry, and with the Jarque-Bera test, a normality test of the residuals with two degrees of freedom (JB = 42.84). From these we deduce that the distribution of the residuals is not normal.
The treatment was therefore changed to the least absolute deviation (LAD) method. Because this method is non-parametric, its parameters were deduced from the least squares estimates, and the aberrant data, whose detection is very significant, were identified and dealt with.
The robustness of LAD regression comes from its low sensitivity to aberrant values, which hardly change the direction coefficient of the regression line. We therefore removed the anomalous point (1-propylamine) and repeated the analysis to check for new aberrant values.
After withdrawing observation 16 (1-propylamine), we find with LAD a = -2.51 and b = 9.71, which shows that the estimated parameters remain stable around the true values when aberrant values are removed.
In this work, the most significant effect of the aberrant data is on the coefficient of determination, which increased by 6.22 percentage points, from 87.96% to 94.18%, after removal of the aberrant compounds; this is interpreted as a better fit, the coefficient of determination being positively affected by the absence of aberrant values. The standard deviation of the residuals also decreased by about 24%, from 0.399 to 0.303, after removal of the aberrant values.
The model is good and statistically strong.
CONFLICTS OF INTEREST:
The authors declare that there are no conflicts of interest.
REFERENCES:
1. Khadidja Bellifa, Sidi Mohamed Mekelleche.2012. QSAR study of the toxicity of nitrobenzenes to Tetrahymena pyriformis using quantum chemical descriptors. Arabian Journal of Chemistry, xxx, xxx–xxx
2. Nadia Ziani, Khadidja Amirat and Djelloul Messadi.2014 Inhibition of Tetrahymena pyriformis growth by Aliphatic Alcohols and Amines: a QSAR Study. Rev. Sci. Technol., Synthèse 29: 51-58.
3. Stefan M. Kohlbacher, Thierry Langer and Thomas Seidel.2021. QPHAR: quantitative pharmacophore activity relationship: method and validation. Journal of Cheminformatics 13, Article number: 57.
4. Sajjad Bordbar, Mostafa Alizadeh, Sayyed Hojjat Hashemi. 2013. Effects of microstructure alteration on corrosion behavior of welded joint in API X70 pipeline steel. Materials and Design (ScienceDirect), Elsevier. Vol. 45, 597-604.
5. Fabrizio Fratini, Patrizia Tettamanzi.2015. Corporate Governance and Performance: Evidence from Italian Companies. Open Journal of Business and Management. Vol.3 No.2.
6. Samah Anwar, Bahaa Khalil, Mohamed Seddik, Abdelhamid Eltahan, Aiman El Saadi.2022. A nonparametric statistical approach for the estimation of water quality characteristics in ungauged streams/watersheds. Journal of Hydrology. https://doi.org/10.1016/j.jhydrol.2022.128174.
7. Eriksson, L., Jaworska, J., Worth, A., Cronin, M., Mc Dowell, R.M., Gramatica, P. (2003). Methods for reliability, uncertainty assessment, and applicability evaluations of regression based and classification QSPRs. Environmental Health Perspective Journal, 111(10):1361-1375. https://doi.org/10.1289/ehp.5758.
8. Tropsha, A., Gramatica, P., Gombar, V.K. (2003). The importance of being earnest: Validation is the absolute essential for successful application and interpretation of QSPR models. QSAR and Combinatorial Science, 22(1): 69-76. https://doi.org/10.1002/qsar.200390007.
9. Hyperchem TM Release 6.03 for Windows, Molecular Modeling System, 2000.
10. Todeschini, R., Consonni, V., Pavan, M. (2006). DRAGON: Software for the Calculation of Molecular Descriptors. Release 5.3 for Windows, Milano.
11. Todeschini, R., Ballabio, D., Consonni, V., Mauri, A., Pavan, M. (2009). MOBY DIGS software for multilinear regression analysis and variable subset selection by genetic algorithm. Release 1.1 for Windows, Milano.
12. MINITAB, Release 13.31, Statistical Software, 2000.
13. Estrada, E. and Molina, E. 2001. Novel local (fragment-based) topological molecular descriptors for QSPR/QSAR and molecular design. Journal of Molecular Graphics and Modelling, 20(1), 54-64. doi:10.1016/S1093-3263(01)00100-0. PMID: 11760003.
14. Goodarzi, M., Jensen, R. and Vander Heyden, Y. 2012. QSRR modeling for diverse drugs using different feature selection methods coupled with linear and nonlinear regression. Journal of Chromatography B: Analytical Technologies in the Biomedical and Life Sciences. 494, doi:10.1016/j.jchromb.2012.01.012. PMID: 22341354.
15. MATLAB Version 7.0.0 19920 (Release 14), The Language of Technical Computing. The MathWorks, Inc. May 06 (2004).
16. Zeeman M., Aver C.M., Clements R.G., Nabholtz J.V. and Boethling R.S., 1995. U.S. EPA Regulatory Perspectives on the use of QSAR for new and existing chemical evaluations SAR QSAR, Environmental. Research, Vol. 3(3),179-201.
17. Walker J.D., 2003. Applications of QSARs in toxicology: a US Government perspective, Journal of Molecular Structure - Theochem, Vol. 622(1-2), 167-184.
18. Bradbury S.P., Russon C.L., Ankley G.T., Schultz T.W. and Walker J.D., 2003. Overview of data and conceptual approaches for derivation of Quantitative Structure –Activity Relationships, for ecotoxicological effects of organic chemicals Environmental Toxicology and Chemistry, Vol. 22 (8), 1789-1798.
19. European Commission. White Paper on a Strategy for a Future Community Policy for Chemicals, 2001. http://europa.eu.int/comm/enterprise/reach/.
20. Toussaint M.W., Shedd T.R., Van der Schalie W.H. and Leather G.R., 1995. A comparison of standard acute toxicity tests with rapid screening toxicity tests. Environmental Toxicology and Chemistry Vol. 14(5), 907-915.
21. Kubinyi H., 2002. From Narcosis to Hyperspace: The History of QSAR, Quantitative Structure-Activity Relationships, Vol. 21(4), 348-356. http://ecb.jrc.it/QSAR/.
22. Schultz T.W., Cronin M.T.D., Walker J.D. and Aptula A.O., 2003. Quantitative structure-activity relationships (QSARs) in toxicology: a historical perspective, Journal of Molecular Structure - Theochem, Vol. 622(1-2), 1-22.
23. Posthumus R. and Slooff W., 2001. Implementation of QSARs in ecotoxicological risk assessments. RIVM report 601516003.
24. Dearden J.C., 2002. Prediction of Environmental Toxicity and Fate Using Quantitative Structure-Activity Relationships (QSARs), Journal of the Brazilian Chemical Society, Vol. 13(6), 754-762.
25. Schultz T.W., Cronin M.T.D. and Netzeva T.I., 2003. The present status of QSAR in toxicology, Journal of Molecular Structure - Theochem, Vol. 622(1-2), 23-38.
26. Cronin M.T.D. and Dearden J.C., 1995. QSAR in toxicology. Prediction of Aquatic Toxicity, Quantitative Structure-Activity Relationships, Vol. 14(1), 1-7.
27. Mannhold R. and van de Waterbeemd H., 2001. Substructure and whole molecule approaches for calculating logP, Journal of Computer-Aided Molecular Design, Vol. 15(4), 337-354.
28. Mannhold R. and Rekker R.F., 2000. The hydrophobic fragmental constant approach for calculating logP in octanol/water and aliphatic hydrocarbon/water systems, Perspectives in Drug Discovery and Design, Vol. 18(1), 1-18.
29. Benfenati E., Gini G., Piclin N., Roncaglioni A. and Vari M.R., 2003. Predicting log P of pesticides using different software, Chemosphere, Vol. 53(9), 1155-1164.
30. Klopman G., Li J.K., Wang S. and Dimayuga M., 1994. Computer automated log P calculations based on an extended group contribution approach, Journal of Chemical Information and Computer Sciences, Vol. 34(4), 752-781.
31. Kaiser K.L.E., 2003. The use of neural networks in QSARs for acute aquatic toxicological endpoints, Journal of Molecular Structure - Theochem, Vol. 622(1-2), 85-95.
32. Papa E., Villa F. and Gramatica P., 2005. Statistically Validated QSARs, Based on Theoretical Descriptors, for Modeling Aquatic Toxicity of Organic Chemicals in Pimephales promelas (Fathead Minnow), Journal of Chemical Information and Modeling, Vol. 45(5), 1256-1266.
33. Roy K. and Ghosh G., 2009. QSTR with extended topochemical atom (ETA) indices. 12. QSAR for the toxicity of diverse aromatic compounds to Tetrahymena pyriformis using chemometric tools, Chemosphere, Vol. 77(7), 999-1009.
34. Zhao Y.H., Zhang X.J., Wen Y., Sun F.T., Guo Z., Qin W.C., Qin H.W., Xu J.L., Sheng L.X. and Abraham M.H., 2010. Toxicity of organic chemicals to Tetrahymena pyriformis: Effect of polarity and ionization on toxicity, Chemosphere, Vol. 79(1), 72-77.
35. Roy K. and Das R.N., 2010. QSTR with extended topochemical atom (ETA) indices. 14. QSAR modeling of toxicity of aromatic aldehydes to Tetrahymena pyriformis, Journal of Hazardous Materials, Vol. 183(1-3), 913-922.
36. Bouaoune A., Lourici L., Haddag H. and Messadi D., 2012. Inhibition of Microbial Growth by anilines: A QSAR study, Journal of Environmental Science and Engineering, A1, Vol. 1(5A), 663-671.
37. Hill D.L., 1972. The Biochemistry and Physiology of Tetrahymena. Academic Press, New York and London, 230 p.
38. Schultz T.W., Lin D.T., Wilke T.S. and Arnold L.M., 1990. Quantitative structure-activity relationsh
39. Tiffany Machabert. 2014. "Modèles en très grande dimension avec des outliers. Théorie, simulations, applications". Paris.
40. Soner Çankaya and Samet Hasan Abacı. 2015. A Comparative Study of Some Estimation Methods in Simple Linear Regression Model for Different Sample Sizes in Presence of Outliers. Turkish Journal of Agriculture - Food Science and Technology. ISSN: 2148-127X.
41. Jiehan Zhu and Ping Jing. 2010. The Analysis of Bootstrap Method in Linear Regression Effect. Journal of Mathematics Research, Vol. 2, No. 4.
42. Yinbo Li and Gonzalo R. Arce. 2004. A Maximum Likelihood Approach to Least Absolute Deviation Regression. EURASIP Journal on Applied Signal Processing, 12, 1762–1769.
43. Gonzalez, M.P., Teran, C., Saiz-Urra, L. and Teijeira, M. 2008. Variable selection methods in QSAR: an overview. Current Topics in Medicinal Chemistry, 8(18), 1606-1627. doi:10.2174/156802608786552 PMID:19075770.
44. Roman Kaliszan, Tomasz Ba̧czek, Adam Buciński, Bogusław Buszewski, Małgorzata Sztupecka. 2003. Prediction of gradient retention from the solvent strength (LSS) model, quantitative structure-retention relationships (QSRR), and artificial neural networks (ANN). Journal of Separation Science. Volume 26, Issue 3-4.
45. Barlin, G.B. 1982. The Pyrazines; Wiley-Interscience: New York.
46. Pynnönen, Seppo and Timo Salmi (1994). A Report on Least Absolute Deviation Regression with Ordinary Linear Programming. Finnish Journal of Business Economics 43:1, 33-49.
47. Dodge, Y. and Valentin Rousson (2004). Analyse de régression appliquée. Paris.
48. Faria, S. and Melfi, G. (2006). LAD regression and nonparametric methods for detecting outliers and leverage points. Student, 5: 265–272.
49. Gabriela Ciuperca. (2009). Estimation robuste dans un modèle paramétrique avec rupture. Bordeaux.
50. Gilbert Saporta. (2012). Régression robuste.
51. Ndèye Niang and Gilbert Saporta. (2014). Régression robuste, régression non-paramétrique.
52. Nadia H. Al-Noor and Asmaa A. Mohammad. 2013. Model of Robust Regression with Parametric and Nonparametric Methods. Journal of Mathematical Theory and Modeling, Vol. 3, No. 5.
53. Dodge, Y. (2004). Statistique: Dictionnaire encyclopédique.
54. Dodge, Y. and Jureckova, J. (2000). Adaptive Regression. Springer-Verlag New York.
55. Nornadiah Mohd Razali and Yap Bee Wah. 2011. Power Comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests, Journal of Statistical Modeling and Analytics, Vol. 2, No. 1: 21-33.
Received on 20.09.2022 Modified on 11.02.2023
Accepted on 21.04.2023 ©AJRC All rights reserved
Asian J. Research Chem. 2023; 16(3):195-204.
DOI: 10.52711/0974-4150.2023.00031